Reducing the Number of Canonical Form Tests for Frequent Subgraph Mining

Authors

  • Andrés Gago Alonso
  • Jesús Ariel Carrasco-Ochoa
  • José Eladio Medina-Pagola
  • José Francisco Martínez Trinidad
Abstract

Frequent connected subgraph (FCS) mining is an interesting problem with wide applications in real life. Most FCS mining algorithms detect duplicate candidates using canonical form tests. These tests have high computational complexity and therefore reduce the efficiency of graph miners. In this paper, we introduce novel properties that reduce the number of canonical form tests in FCS mining. Based on these properties, we present a new FCS mining algorithm called gRed. Experiments on real-world datasets show the impact of the proposed properties on the efficiency of gRed, which performs fewer canonical form tests than gSpan. In addition, the performance of our algorithm is compared against gSpan and other state-of-the-art algorithms.
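
To make the cost of a canonical form test concrete, here is a minimal sketch of duplicate detection by canonical form. It is not the method used by gRed or gSpan (gSpan relies on minimum DFS codes). It assumes a brute-force encoding that tries every vertex ordering of a small labeled undirected graph and keeps the lexicographically smallest code. The names `canonical_form` and `is_duplicate` and the graph representation are hypothetical, chosen only for illustration.

```python
from itertools import permutations


def canonical_form(vertex_labels, edges):
    """Brute-force canonical code of a small labeled undirected graph.

    vertex_labels: list of string labels, indexed by vertex id.
    edges: dict mapping frozenset({u, v}) to a string edge label.
    Returns the smallest (labels, upper-triangle) encoding over all
    vertex orderings. The loop visits n! orderings, which is why such
    tests are expensive and worth avoiding where possible.
    """
    n = len(vertex_labels)
    best = None
    for perm in permutations(range(n)):
        labels = tuple(vertex_labels[v] for v in perm)
        # Upper triangle of the adjacency matrix under this ordering;
        # the empty string marks a missing edge.
        upper = tuple(
            edges.get(frozenset((perm[i], perm[j])), "")
            for i in range(n)
            for j in range(i + 1, n)
        )
        code = (labels, upper)
        if best is None or code < best:
            best = code
    return best


def is_duplicate(candidate, seen):
    """Return True if an isomorphic candidate was already recorded in `seen`."""
    code = canonical_form(*candidate)
    if code in seen:
        return True
    seen.add(code)
    return False


# Two numberings of the same path A -x- B -y- C, plus a different graph.
g1 = (["A", "B", "C"], {frozenset((0, 1)): "x", frozenset((1, 2)): "y"})
g2 = (["C", "B", "A"], {frozenset((0, 1)): "y", frozenset((1, 2)): "x"})
g3 = (["A", "B", "C"], {frozenset((0, 1)): "y", frozenset((1, 2)): "x"})

seen = set()
results = [is_duplicate(g, seen) for g in (g1, g2, g3)]
```

In this sketch `g2` is flagged as a duplicate of `g1`, while `g3` is kept because its edge labels are attached differently. Every candidate pays the full factorial cost here, which is the kind of work that properties for skipping canonical form tests aim to avoid.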

Similar resources

A new algorithm for mining frequent connected subgraphs based on adjacency matrices

Most of the Frequent Connected Subgraph Mining (FCSM) algorithms have been focused on detecting duplicate candidates using canonical form (CF) tests. CF tests have high computational complexity, which affects the efficiency of graph miners. In this paper, we introduce novel properties of the canonical adjacency matrices for reducing the number of CF tests in FCSM. Based on these properties, a n...

On Speeding up Frequent Approximate Subgraph Mining

Frequent approximate subgraph (FAS) mining has become an interesting task with wide applications in several domains of science. Most previous studies have focused on reducing the search space or the number of canonical form (CF) tests. CF tests are commonly used for duplicate detection; however, these tests affect the efficiency of the mining process because they have high computational...

Graph Mining: Repository vs. Canonical Form

In frequent subgraph mining one tries to find all subgraphs that occur with a user-specified minimum frequency in a given graph database. The basic approach is to grow subgraphs, adding an edge and maybe a node in each step, to count the number of database graphs containing them, and to eliminate infrequent subgraphs. The predominant method to avoid redundant search (the same subgraph can be gr...

Duplicate Candidate Elimination and Fast Support Calculation for Frequent Subgraph Mining

Frequent connected subgraph mining (FCSM) is an interesting task with wide applications in real life. Most of the previous studies are focused on pruning search subspaces or optimizing the subgraph isomorphism (SI) tests. In this paper, a new property to remove all duplicate candidates in FCSM during the enumeration is introduced. Based on this property, a new FCSM algorithm called gdFil is pro...

Frequent Subgraph Discovery

Over the years, frequent itemset discovery algorithms have been used to solve various interesting problems. As data mining techniques are increasingly applied to non-traditional domains, existing approaches for finding frequent itemsets cannot be used because they cannot model the requirements of these domains. An alternate way of modeling the objects in these data sets is to use a graph to mo...

Journal title:
  • Computación y Sistemas

Volume 15  Issue 

Pages  -

Publication date 2011